<% component = metadata.sinks.aws_s3 %>

<%= component_header(component) %>

## Config File

<%= component_config_example(component) %>

## Options

<%= options_table(component.options.to_h.values.sort) %>

## Examples

The `aws_s3` sink batches [`log`][docs.log_event] up to the `batch_size` or
`batch_timeout` options. When flushed, Vector will write to [AWS S3][url.aws_s3]
via the [`PutObject` API
endpoint](https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html).
The encoding is dictated by the `encoding` option. For example:

```http
POST / HTTP/1.1
Host: kinesis.<region>.<domain>
Content-Length: <byte_size>
Content-Type: application/x-amz-json-1.1
Connection: Keep-Alive 
X-Amz-Target: Kinesis_20131202.PutRecords
{
    "Records": [
        {
            "Data": "<base64_encoded_event>",
            "PartitionKey": "<partition_key>"
        },
        {
            "Data": "<base64_encoded_event>",
            "PartitionKey": "<partition_key>"
        },
        {
            "Data": "<base64_encoded_event>",
            "PartitionKey": "<partition_key>"
        },
    ],
    "StreamName": "<stream_name>"
}
```

## How It Works [[sort]]

<%= component_sections(component) %>

### Columnar Formats

Vector has plans to support column formats, such as ORC and Parquet, in
[`v0.6`][url.roadmap].

### Object Naming

By default, Vector will name your S3 objects in the following format:

{% code-tabs %}
{% code-tabs-item title="no compression" %}
```
<key_prefix><timestamp>-<uuidv4>.log
```
{% endcode-tabs-item %}
{% code-tabs-item title="gzip" %}
```
<key_prefix><timestamp>-<uuidv4>.log.gz
```
{% endcode-tabs-item %}
{% endcode-tabs %}

For example:

{% code-tabs %}
{% code-tabs-item title="no compression" %}
```
date=2019-06-18/1560886634-fddd7a0e-fad9-4f7e-9bce-00ae5debc563.log
```
{% endcode-tabs-item %}
{% code-tabs-item title="gzip" %}
```
date=2019-06-18/1560886634-fddd7a0e-fad9-4f7e-9bce-00ae5debc563.log.gz
```
{% endcode-tabs-item %}
{% endcode-tabs %}

Vector appends a [UUIDV4][url.uuidv4] token to ensure there are no name
conflicts in the unlikely event 2 Vector instances are writing data at the same
time.

You can control the resulting name via the `key_prefix`, `filename_time_format`,
and `filename_append_uuid` options.

### Searching

Storing log data in S3 is a powerful strategy for persisting log data. Mainly
because data on S3 is searchable. And [AWS Athena][url.aws_athena] makes this
easier than ever.

#### Athena

1. Head over to the [Athena console][url.aws_athena_console].

2. Create a new table, replace the `<...>` variables as needed:

    ```sql
    CREATE EXTERNAL TABLE logs (
      timestamp string,
      message string,
      host string
    )   
    PARTITIONED BY (date string)
    ROW FORMAT  serde 'org.apache.hive.hcatalog.data.JsonSerDe'
    with serdeproperties ( 'paths'='timestamp, message, host' )
    LOCATION 's3://<region>.<key_prefix>';
    ```

3. Discover your partitions by running the following query:

    ```sql
    MSCK REPAIR TABLE logs
    ```

4. Query your data:

    ```sql
    SELECT host, COUNT(*)
    FROM logs
    GROUP BY host
    ```

Vector has plans to support [columnar formats](#columnar-formats) in
[`v0.6`][url.roadmap] which will allows for very fast and efficient querying on
S3.

## Troubleshooting

<%= component_troubleshooting(component) %>

## Resources

<%= component_resources(component) %>